Thirty Days of Metal — Day 10: 2D Math
This series of posts is my attempt to present the Metal graphics programming framework in small, bite-sized chunks for Swift app developers who haven’t done GPU programming before.
If you want to work through this series in order, start here.
Points and Vectors
So far, we’ve made the simplifying assumption that all vertex positions are expressed in normalized device coordinates (NDC), which conveniently happens to match what Metal expects from the vertex shader.
If we eventually want to do useful work in three dimensions, though, we need to understand how to move between different coordinate systems. We’ll start by distinguishing points from vectors and discovering the basic operations we can perform on them.
Suppose we have some points.
Okay, that’s not especially useful. We don’t know where these points are, or how far apart they are, because we haven’t established a coordinate system. Let’s do that and give them some labels.
Much better. Now we can see that the origin of the coordinate system is in the middle of the space, and the points are each about one hundred units away from the origin.
In addition to points, we need to get familiar with vectors. Whereas a point is a location in a space, a vector is a displacement in a space. This means that a vector’s coordinates are the same regardless of “where” it is in space.
Point and Vector Arithmetic
This point-vector distinction is crucial, because it allows us to define sensible arithmetic operations between points and vectors. For example, we can move from one point to another by adding a vector to the initial point. Here, the point a is displaced by the vector b to arrive at the point c.
It is also meaningful to subtract one point from another. The result is a vector that points from the first point to the second point. This follows immediately from rearranging the equations above.
Adding two vectors also makes sense. We do this geometrically by placing the “tail” of the second vector at the “tip” of the first vector. The displacement from the tail of the first vector to the tip of the second vector is then the displacement of their sum, which is yet another vector.
We can “convert” from a vector to a point by adding the vector to the origin. Since we already know that adding a vector to a point is valid, and the origin is just a point, we expect their sum to be a point that is displaced from the origin according to the vector.
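As a concrete illustration of these rules, here is a minimal sketch in plain Swift. The Point and Vector types and their operators are hypothetical, defined only for this example; they aren't part of this series' code, which will use the simd types instead:

```swift
// A location in 2D space. (Illustrative type, not from the series' code.)
struct Point: Equatable {
    var x, y: Float
}

// A displacement in 2D space; its coordinates are the same regardless of
// "where" it is. (Illustrative type, not from the series' code.)
struct Vector: Equatable {
    var x, y: Float
}

// Point + Vector = Point: displacing a location yields another location.
func + (p: Point, v: Vector) -> Point {
    Point(x: p.x + v.x, y: p.y + v.y)
}

// Point - Point = Vector: the displacement that carries b to a.
func - (a: Point, b: Point) -> Vector {
    Vector(x: a.x - b.x, y: a.y - b.y)
}

// Vector + Vector = Vector: chaining two displacements tip-to-tail.
func + (u: Vector, v: Vector) -> Vector {
    Vector(x: u.x + v.x, y: u.y + v.y)
}

let origin = Point(x: 0, y: 0)
let a = Point(x: 100, y: 0)
let b = Vector(x: -100, y: 100)
let c = a + b             // the point a displaced by the vector b
let ab = c - a            // subtracting points recovers the vector b
let asPoint = origin + b  // "converting" a vector to a point
```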
Translating Points
Let’s connect our points together into a triangle so we can see how various operations affect our shapes.
Adding a constant vector to each point of a shape moves the shape in space. This is called translation.
In the figure above, we’ve removed the origin, axes, and labels to emphasize the geometric nature of translation, but if we actually wanted to perform the translation in code (as we will later), we’d have to write down the points and vector in a particular coordinate system, then carry out the addition.
Scaling Points
We have looked at several examples of addition and subtraction among points and vectors, but what happens when we multiply a point by a scalar (a single number)?
We can gain some intuition for this by considering how numbers on a one-dimensional number line move when multiplied: a multiplicative factor of greater than one causes points on a number line to move away from the origin (proportionately to their distance from the origin), and a multiplicative factor of less than one causes points to move toward the origin.
Likewise, if we multiply points by a scalar, they move toward or away from the origin. We call this process scaling.
If we consider our friendly triangle again and imagine multiplying its points by some factor like 1.2, we can observe scaling in action.
The vertices of the triangle are still roughly uniformly distributed about the origin because the center of the triangle nearly coincides with the origin.
If we scale a figure that is not centered around the origin, the points still scale proportionately, but rather than appearing to grow in place, the figure also seems to move away from the origin (or toward it, when shrinking).
This illustrates that the coordinate space we model our shapes in matters. For example, if we were designing a bipedal character, we might place the origin of its local coordinate system between its feet, so that when it is scaled up, it grows away from the ground plane while its feet remain planted. Other times, it's more convenient to place the origin at the centroid, or average, of the points. It all depends on what makes it easiest to compose objects together in a larger virtual world.
Rotating Points
Translation and scaling are two examples of transformations. A transformation is a way of moving or reshaping a figure.
The last fundamental transformation we will talk about is rotation.
In two dimensions, there is only one axis that we can rotate around: the z axis. The z axis is perpendicular to both the x and y axes; it projects “out of the screen” toward us. Rotating around the z axis by a particular angle makes points orbit around the origin.
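Written out, a point (x, y) rotated counterclockwise about the origin by an angle θ lands at:

```latex
\begin{aligned}
x' &= x \cos\theta - y \sin\theta \\
y' &= x \sin\theta + y \cos\theta
\end{aligned}
```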
Rotation can seem intimidating at first because it involves trigonometric functions, but if this is your first time seeing these formulas, you shouldn’t worry about memorizing or understanding them. We will package up the math into utility functions that just do the right thing. We’re developing a vocabulary of transformations here, not taking a math class.
Rotating a figure that is not centered on the local origin has the effect of making the entire object move around the origin rather than its own centroid. This is often what we want. Think of a camera orbiting around a scene in “bullet time”: the camera’s path through space is an arc focused on the center of the action.
Composing Transformations
We have talked about three fundamental types of transformations: translation, rotation, and scaling. The formulas for these transformations each look quite distinct. Translation is just vector addition, scaling is multiplying by a scalar, and rotation is a linear combination of vector components multiplied by trigonometric functions.
Suppose we want to chain different transformations together. One extremely common sequence of transformations is: scale, then rotate, then translate. We could write down an equation that does all three like this:
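For a point p = (x, y), with scale factors s_x and s_y, rotation angle θ, and translation offsets t_x and t_y, one way to write that combined equation is:

```latex
\begin{aligned}
x' &= s_x x \cos\theta - s_y y \sin\theta + t_x \\
y' &= s_x x \sin\theta + s_y y \cos\theta + t_y
\end{aligned}
```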
This certainly would work. But sometimes we want to chain together arbitrary transformations in arbitrary orders. It would be nice not to have to write different code every time we want a different sequence.
Perhaps there’s some way to write down all of these different transformations as the same kind of thing…
Enter the Matrix
It turns out that we can write scaling, rotation, translation, and many other kinds of transformation as matrices.
First of all, though, what is a matrix?
Just as a scalar is a single number and a vector is a one-dimensional list of numbers, a matrix is a two-dimensional array of numbers. That’s it. All of the power of matrices comes from the mathematical rules we apply to them.
A matrix with N rows and M columns is called an N×M matrix. We identify an individual element in a matrix with a subscript pair that indicates its row and column. For example, in the matrix A below, element a02 is in the first row (row 0) and the third column (column 2).
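Using zero-based subscripts, a 3×3 matrix A is laid out like this:

```latex
A = \begin{bmatrix}
a_{00} & a_{01} & a_{02} \\
a_{10} & a_{11} & a_{12} \\
a_{20} & a_{21} & a_{22}
\end{bmatrix}
```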
Matrix Multiplication
It turns out that the only rule we need to unify these transformations in the language of matrices is matrix multiplication. Matrix multiplication isn't quite as simple as scalar or vector multiplication, but it follows straightforward rules.
Suppose we want to multiply two matrices, B and A, forming the matrix product BA. For two matrices to be compatible for multiplication, the number of columns in the left-hand matrix must equal the number of rows in the right-hand matrix. B and A are both 3×3 matrices, so they satisfy this rule.
Each element in the product matrix is found by taking the corresponding row from the left matrix and the corresponding column from the right matrix, multiplying them together pairwise, and summing up the results.
As an example, to find the element in the second row and third column of the product BA, we take the second row of B and the third column of A, multiply each pair of their respective elements, and add up the products.
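In subscript notation (zero-based, so the second row is row 1 and the third column is column 2), that element is:

```latex
(BA)_{12} = b_{10}\,a_{02} + b_{11}\,a_{12} + b_{12}\,a_{22}
```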
Carrying out this procedure for every element gives the complete matrix product.
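The row-by-column procedure is easy to sketch in plain Swift. This multiply function, operating on matrices stored as arrays of rows, is purely illustrative; the series itself uses the simd types introduced below:

```swift
// Multiply two 3×3 matrices stored as arrays of rows.
// Each element of the product is the sum of pairwise products of a row of
// the left matrix with a column of the right matrix.
func multiply(_ b: [[Float]], _ a: [[Float]]) -> [[Float]] {
    var product = [[Float]](repeating: [Float](repeating: 0, count: 3), count: 3)
    for row in 0..<3 {
        for col in 0..<3 {
            var sum: Float = 0
            for k in 0..<3 {
                sum += b[row][k] * a[k][col]
            }
            product[row][col] = sum
        }
    }
    return product
}

// Multiplying by the identity matrix leaves a matrix unchanged.
let identity: [[Float]] = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
let m: [[Float]] = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
let p = multiply(m, identity)
```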
Expressing Transformations as Matrices
For reasons that are beyond the scope of this article, it takes an (N + 1)×(N + 1) matrix to represent scale, rotation, and translation in an N-dimensional space, so we will be working with 3×3 matrices in this section.
We turn a transformation formula into a transformation matrix by placing the multiplicative factors and translational component for the x coordinate in the first row, and the factors and translation for the y coordinate in the second row. Any element along the diagonal that doesn’t already have a value is set to 1, while any such element off the diagonal is set to 0.
We can write scaling as a matrix like this:
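In 3×3 homogeneous form, the scale matrix is:

```latex
S = \begin{bmatrix}
s_x & 0 & 0 \\
0 & s_y & 0 \\
0 & 0 & 1
\end{bmatrix}
```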
where sx is the scale factor along the x axis, and sy is the scale factor along the y axis.
We apply a transformation to a point by treating the point as a 3×1 column matrix with a 1 in the bottom row. For example, to scale a point with our newly constructed 2D scaling matrix, we'd write this:
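Carrying out the row-by-column multiplication:

```latex
\begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
=
\begin{bmatrix} s_x x \\ s_y y \\ 1 \end{bmatrix}
```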
Note that the x coordinate is scaled by the x scaling factor, and the y coordinate is scaled by the y scaling factor, as intended.
We can formulate a rotation matrix using the same procedure as above, using the sine and cosine factors we learned about earlier:
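With θ as the rotation angle:

```latex
R = \begin{bmatrix}
\cos\theta & -\sin\theta & 0 \\
\sin\theta & \cos\theta & 0 \\
0 & 0 & 1
\end{bmatrix}
```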
By multiplying this matrix with a vector to the right, we recover our earlier rotation equations:
Finally, we can bring translation into this framework by placing the translation offsets into the third column.
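With translation offsets t_x and t_y:

```latex
T = \begin{bmatrix}
1 & 0 & t_x \\
0 & 1 & t_y \\
0 & 0 & 1
\end{bmatrix}
```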
When we perform matrix multiplication with a translation matrix, the final 1 in our point’s column matrix “picks up” the translational components, resulting in them being added to the x and y coordinates respectively:
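Carrying out the multiplication:

```latex
\begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
=
\begin{bmatrix} x + t_x \\ y + t_y \\ 1 \end{bmatrix}
```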
This is exactly what we expect from translation.
Now we can write a single matrix that scales, rotates, and translates a point all at once.
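Because matrices apply to a column vector from right to left, scale-then-rotate-then-translate becomes:

```latex
M = T\,R\,S, \qquad p' = T\,R\,S\,p
```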
With that, we’ve successfully established a mathematical framework for treating all basic transformations as matrices, which allows us to easily compose arbitrary sequences of transformations together via matrix multiplication.
Virtualizing the Canvas
Recall again that we’ve been writing all of our vertex positions in terms of normalized device space. This has been convenient because it means we can pass these positions directly through our vertex function. However, it is often not the most convenient space to work in.
Ordinarily, in a painting program like Adobe Photoshop, you define a canvas size that lets you specify how large the image should be in pixels. The image might later be scaled and zoomed in a different context, but these operations happen relative to the dimensions you originally chose.
We can do the same thing when drawing pictures with Metal. Imagine we have a window that is 800×600 points in size. Then, instead of working in a space whose x coordinates run from -1 on the left to 1 on the right, we might choose a space that runs from -400 to 400, for a total width of 800, matching our window.
Since the user might choose to resize the window, we need to decide how to react. The right answer depends on context, but one reasonable choice is to keep the image filling the view horizontally while letting its vertical extent follow the view's aspect ratio, so that shapes aren't stretched or squashed.
Whatever we decide to do, we need a way to transform from our chosen canvas space to NDC space.
Projection Transformations
The task of reshaping our virtual world to match NDC is called projection. Projection simultaneously transforms vertices into NDC and introduces perspective, if necessary. We won’t discuss perspective here just yet, but we do need to get comfortable with the idea of projection transformations.
The coordinates in the canvas space we chose above run from -400 to 400 in x and -300 to 300 in y by default. We can generalize this into a scheme where we specify arbitrary left, right, top, and bottom boundaries for our virtual canvas. Then, the task of formulating a projection transform amounts to scaling and shifting points on our canvas into normalized device coordinates.
Normalized Device Coordinates, in Depth
I haven’t been entirely forthcoming about the nature of NDC space. In addition to x and y axes that run from -1 to 1, NDC space also has a z axis that runs from 0 to 1, pointing away from us, the viewer. To formulate a projection transform that does the right thing, we need to account for this third dimension.
Fortunately, because we’re still drawing in 2D, we can mostly get away with ignoring the z axis. The z coordinate of all of our vertices will be forced to 0 by our vertex function, as we’ve been doing all along. So, all we have to do is pick a sensible range for z coordinates in our canvas space, and map it to NDC’s z axis with our projection transformation.
As you may have guessed by now, we're going to build a matrix to do this transformation. Since I just introduced another dimension to our idea of NDC, we also need to add another row and column to our transformation matrices, making them 4×4.
This isn’t as bad as it sounds: our rotation, scale, and translation matrices just get a few more ones and zeroes, following all of the same rules as before:
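For example, the 2D translation matrix in its 4×4 form is:

```latex
T = \begin{bmatrix}
1 & 0 & 0 & t_x \\
0 & 1 & 0 & t_y \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
```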
Orthographic Projections
With our existing transformation matrices patched up for the third dimension, let’s continue exploring projection.
We’ll be doing a particular kind of projection called “orthographic projection.” The term orthographic here means that our virtual scene is projected along parallel lines toward the viewing plane. This contrasts with perspective projection, in which projection happens along non-parallel lines, introducing effects like foreshortening. We will look at perspective projection when we start drawing in 3D.
An orthographic projection transforms each axis using the same procedure: scale the coordinates so that they span the width of NDC, then offset them so they’re centered on the origin. If that sounds like a combination of a scale matrix and a translation matrix, that’s because it is.
We write our orthographic projection matrix in terms of the left, right, top, and bottom boundaries of our canvas. For brevity, they’re abbreviated as l, r, t, and b below. We also have near and far boundaries in z, abbreviated n and f, which we’ll just set to 0 and 1 for now.
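Written out in terms of those boundaries, the orthographic projection matrix is:

```latex
M_{\mathrm{ortho}} = \begin{bmatrix}
\frac{2}{r - l} & 0 & 0 & \frac{l + r}{l - r} \\
0 & \frac{2}{t - b} & 0 & \frac{t + b}{b - t} \\
0 & 0 & \frac{1}{n - f} & \frac{n}{n - f} \\
0 & 0 & 0 & 1
\end{bmatrix}
```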
Matrix Math in Swift
Swift's simd framework includes small matrix types that are ideal for expressing transformation matrices. In particular, the simd_float4x4 type is a 4×4 matrix that includes operators for performing essential operations like matrix-matrix multiplication and matrix-vector multiplication.
We will add extensions to the simd_float4x4 matrix type for each of our basic transformations. Note that the initializer we delegate to expects to receive column vectors rather than row vectors, so if things seem flipped, that’s why.
Here are our basic scale, rotate, and translate transformations in code:
extension simd_float4x4 {
    init(scale2D s: SIMD2<Float>) {
        self.init(SIMD4<Float>(s.x,   0, 0, 0),
                  SIMD4<Float>(  0, s.y, 0, 0),
                  SIMD4<Float>(  0,   0, 1, 0),
                  SIMD4<Float>(  0,   0, 0, 1))
    }

    init(rotateZ zRadians: Float) {
        let s = sin(zRadians)
        let c = cos(zRadians)
        self.init(SIMD4<Float>( c, s, 0, 0),
                  SIMD4<Float>(-s, c, 0, 0),
                  SIMD4<Float>( 0, 0, 1, 0),
                  SIMD4<Float>( 0, 0, 0, 1))
    }

    init(translate2D t: SIMD2<Float>) {
        self.init(SIMD4<Float>(  1,   0, 0, 0),
                  SIMD4<Float>(  0,   1, 0, 0),
                  SIMD4<Float>(  0,   0, 1, 0),
                  SIMD4<Float>(t.x, t.y, 0, 1))
    }
}
We can also write an orthographic projection matrix extension using the formula from above.
extension simd_float4x4 {
    init(orthographicProjectionWithLeft left: Float, top: Float,
         right: Float, bottom: Float, near: Float, far: Float)
    {
        let sx = 2 / (right - left)
        let sy = 2 / (top - bottom)
        let sz = 1 / (near - far)
        let tx = (left + right) / (left - right)
        let ty = (top + bottom) / (bottom - top)
        let tz = near / (near - far)
        self.init(SIMD4<Float>(sx,  0,  0, 0),
                  SIMD4<Float>( 0, sy,  0, 0),
                  SIMD4<Float>( 0,  0, sz, 0),
                  SIMD4<Float>(tx, ty, tz, 1))
    }
}
Matrix-Vector Math in Shaders
Since we can combine multiple transformations (including projection) into a single matrix, adapting our vertex function to use transformations is very straightforward.
As before, we take a constant buffer parameter, but this time it is of type float4x4 &, a reference to a single 4×4 matrix.
We then rewrite our position transformation expression to use the more general transformation matrix:
out.position = transform * float4(in.position, 0.0, 1.0);
The transform matrix here could have any number of transformations baked into it, and we'll explore some time-based animations below using this flexibility.
Animating Transformations in Time
To conclude this article, let's put everything into practice by modifying our updateConstants() method to generate a transformation matrix that combines animated scaling, rotation, and translation effects with our orthographic projection.
We start by defining a new member variable in our renderer to keep track of elapsed time.
var time: TimeInterval = 0.0
We now rewrite our updateConstants() method in its entirety, starting with some timekeeping:
func updateConstants() {
    time += 1.0 / Double(view.preferredFramesPerSecond)
    let t = Float(time)
The first effect we'll introduce is a pulsating scale effect, which will grow and shrink the triangle around its center:
    let pulseRate: Float = 1.5
    let scaleFactor = 1.0 + 0.5 * cos(pulseRate * t)
    let scale = SIMD2<Float>(scaleFactor, scaleFactor)
    let scaleMatrix = simd_float4x4(scale2D: scale)
Next, we apply a rotation that spins the triangle about its center:
    let rotationRate: Float = 2.5
    let rotationAngle = rotationRate * t
    let rotationMatrix = simd_float4x4(rotateZ: rotationAngle)
Finally, we will apply a translation effect that moves the triangle around the screen in a circle, much as we did last time when introducing constant buffers:
    let orbitalRadius: Float = 200
    let translation = orbitalRadius * SIMD2<Float>(cos(t), sin(t))
    let translationMatrix = simd_float4x4(translate2D: translation)
We combine these three matrices into a so-called model matrix, which transforms the triangle from its own local coordinate space into our virtual canvas space.
    let modelMatrix = translationMatrix * rotationMatrix * scaleMatrix
To build our orthographic projection matrix, we calculate a canvas size that is 800 units wide and whose height adapts to the aspect ratio of the view. The center of the view coincides with the origin of our canvas space.
    let aspectRatio = Float(view.drawableSize.width / view.drawableSize.height)
    let canvasWidth: Float = 800
    let canvasHeight = canvasWidth / aspectRatio
    let projectionMatrix = simd_float4x4(orthographicProjectionWithLeft: -canvasWidth / 2,
                                         top: canvasHeight / 2,
                                         right: canvasWidth / 2,
                                         bottom: -canvasHeight / 2,
                                         near: 0.0,
                                         far: 1.0)
To get the final transformation matrix, we multiply our projection matrix by the model matrix of the triangle:
    var transformMatrix = projectionMatrix * modelMatrix
Now, using the dynamic constants technique from the previous article, we can send our transformation matrix to the GPU to be applied in the vertex function:
    currentConstantBufferOffset = (frameIndex % MaxOutstandingFrameCount) * constantsStride
    let constants = constantBuffer.contents().advanced(by: currentConstantBufferOffset)
    constants.copyMemory(from: &transformMatrix, byteCount: constantsSize)
}
(Note that we have updated the constants size member to match the size of the transform matrix, MemoryLayout<simd_float4x4>.size.)
With these changes in place, we can run the app and watch our triangle pulse, rotate, and tumble around the screen.